October 16, 2017

Goals

  • Intro to geospatial data & coordinate reference systems

  • Load spatial data in R

  • Explore and map the data

  • Basic transformations

  • Practice

Before we begin

  1. Download the workshop files

  2. Install any required libraries

Creating Maps in R

Who are you?

Why are you here?

Geographic Data

Geographic Data

Data about locations on or near the surface of the Earth.

Place names

Convey geographic information but don't specify location on the surface of the Earth.

Geospatial data

Represent location geometrically with coordinates

46.130479, -117.134167

Coordinate Reference System

Coordinates indicate specific locations on the Earth when associated with a geographic coordinate reference system or CRS.

Geographic CRS

Specifies

  1. the shape of the Earth (the ellipsoid's semi-major & semi-minor axes)
  2. the origin (equator, prime meridian)
  3. the fit of the CRS to the Earth (the datum, tied to the center of the Earth)
  4. units (lat/lon expressed as decimal degrees or DMS)

Because of variations in 1-3, there are many geographic CRSs!

The World Geodetic System of 1984 (WGS84) is the default geographic CRS used today.

Map Projections

A Projected CRS applies a map projection to a Geographic CRS

Map Projection: a mathematical transformation from a curved to a flat surface

Projected CRSs

There are many, many projected CRSs

All introduce distortion, e.g. in shape, area, distance, direction

No best one for all purposes

Selection depends on location, extent and purpose

Different CRSs

Spatial Data

Spatial data is a more generic term that is not limited to geographic data.

Spatial data are powerful because you can dynamically determine from these data:

  • spatial metrics like area and length,
  • characteristics like distance and direction, and
  • relationships like inside and intersects.
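For example, the sp package (introduced below) can compute great-circle distances directly from coordinate pairs. A minimal sketch, with two illustrative lon/lat points:

```r
library(sp)

# Two illustrative lon/lat points (column 1 = longitude, column 2 = latitude)
pts <- matrix(c(-117.134167, 46.130479,
                -122.2554,   37.86843), ncol = 2, byrow = TRUE)

# With longlat = TRUE, spDists returns great-circle distances in kilometers
spDists(pts, longlat = TRUE)
```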

Types of Spatial Data

Vector Data

Points, lines and polygons

Raster Data

Regular grid of cells (or pixels)
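A raster is just a regular grid of values. As a quick sketch using the raster package (which is not otherwise covered in this workshop), with an arbitrary made-up extent:

```r
library(raster)

# A 10 x 10 grid of random values over an arbitrary 1-degree extent
r <- raster(nrows = 10, ncols = 10,
            xmn = -123, xmx = -122, ymn = 37, ymx = 38)
values(r) <- runif(ncell(r))

plot(r)  # each cell is drawn as a colored pixel
```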

Geospatial Data in R

Geospatial Data in R

There are many approaches to and packages for working with geospatial data in R.

One approach is to keep it simple and store geospatial data in a data frame

cafes <- read.csv('data/cafes.csv')
head(cafes)
##   X        name      long      lat taste price crowded food
## 1 1     babette -122.2554 37.86843    10  4.00       1    1
## 2 2     musical -122.2607 37.86838     7  3.25       1    1
## 3 3   starbucks -122.2661 37.87044     6  2.95       1    0
## 4 4       yalis -122.2664 37.87353     7  2.95       0    0
## 5 5 berkeleyesp -122.2687 37.87366     3  3.25       1    0
## 6 6     fertile -122.2689 37.87493     5  3.25       0    1

Plot of points

plot(cafes$long,cafes$lat)

Fancy plots with ggplot2 and ggmap

library(ggplot2)
library(ggmap)
ggplot() + geom_point(data=cafes, aes(long,lat), col="red", size=2)

BUT!

There are advantages to storing geographic data as spatial objects.

  • Data cleaning before mapping

  • Better management of complex spatial representations

  • CRS transformations and geoprocessing

  • Computing spatial metrics and relationships

  • Data linkages - when data are in the same CRS

sp Package

The sp Package

Classes and Methods for Spatial Data

The sp package is the one most commonly used to construct and manipulate spatial data objects in R.

Hundreds of other R packages that do things with spatial data typically build on sp objects.

sf package

The sf, or simple features, package in R has many improvements

Based on open standards for specifying spatial data

But most spatial packages still depend on sp

So, live on the bleeding edge or check back in a year or so.

sp package

library(sp)
 
getClass("Spatial") # See all sp object types
## Class "Spatial" [package "sp"]
## 
## Slots:
##                               
## Name:         bbox proj4string
## Class:      matrix         CRS
## 
## Known Subclasses: 
## Class "SpatialPoints", directly
## Class "SpatialMultiPoints", directly
## Class "SpatialGrid", directly
## Class "SpatialLines", directly
## Class "SpatialPolygons", directly
## Class "SpatialPointsDataFrame", by class "SpatialPoints", distance 2
## Class "SpatialPixels", by class "SpatialPoints", distance 2
## Class "SpatialMultiPointsDataFrame", by class "SpatialMultiPoints", distance 2
## Class "SpatialGridDataFrame", by class "SpatialGrid", distance 2
## Class "SpatialLinesDataFrame", by class "SpatialLines", distance 2
## Class "SpatialPixelsDataFrame", by class "SpatialPoints", distance 3
## Class "SpatialPolygonsDataFrame", by class "SpatialPolygons", distance 2


sp Vector Objects

Geometry   Spatial Object    Spatial Object with Attributes
Points     SpatialPoints     SpatialPointsDataFrame
Lines      SpatialLines      SpatialLinesDataFrame
Polygons   SpatialPolygons   SpatialPolygonsDataFrame

We use the S*DF objects most frequently!

Let's take a look

Read in point Data

rentals <- read.csv('data/sf_airbnb_2bds.csv')
class(rentals)
## [1] "data.frame"
dim(rentals)
## [1] 1206   14

Examine Data Structure

str(rentals)
## 'data.frame':    1206 obs. of  14 variables:
##  $ id                  : int  91957 18139835 172943 2309140 2559297 149108 10832295 15134083 283235 8013423 ...
##  $ name                : Factor w/ 1203 levels " Victorian Suite in Inner Mission",..: 1091 267 917 1128 479 904 349 696 1064 976 ...
##  $ latitude            : num  37.8 37.8 37.8 37.8 37.8 ...
##  $ longitude           : num  -122 -122 -122 -122 -122 ...
##  $ property_type       : Factor w/ 4 levels "Apartment","Condominium",..: 1 1 3 1 1 3 2 1 1 1 ...
##  $ room_type           : Factor w/ 1 level "Entire home/apt": 1 1 1 1 1 1 1 1 1 1 ...
##  $ accommodates        : int  6 4 3 4 4 3 6 4 4 4 ...
##  $ bathrooms           : num  1.5 1 1 1 1 1 1 1 1 2 ...
##  $ bedrooms            : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ beds                : int  4 2 2 2 2 2 2 2 2 2 ...
##  $ price               : int  300 275 279 395 235 300 450 349 199 235 ...
##  $ review_scores_rating: int  94 100 100 97 86 100 100 92 96 93 ...
##  $ neighbourhood       : Factor w/ 52 levels "Alamo Square",..: 7 52 7 52 52 7 52 26 52 20 ...
##  $ listing_url         : Factor w/ 1206 levels "https://www.airbnb.com/rooms/1003756",..: 1156 490 446 666 689 293 55 311 711 1073 ...

Examine Data Content

head(rentals)
##         id                                      name latitude longitude
## 1    91957                    Sunny SF Flat with 5*s 37.76194 -122.4493
## 2 18139835 Brand New Apartment in the Heart of NOPA! 37.77568 -122.4453
## 3   172943        Redwood Cottage on Ashbury Heights 37.76022 -122.4495
## 4  2309140                               The Lady Di 37.77524 -122.4394
## 5  2559297       Entire 2BR w/pkg centrally-located! 37.77507 -122.4409
## 6   149108          QUIET LUXURY COLE VALLEY PARKING 37.76101 -122.4507
##   property_type       room_type accommodates bathrooms bedrooms beds price
## 1     Apartment Entire home/apt            6       1.5        2    4   300
## 2     Apartment Entire home/apt            4       1.0        2    2   275
## 3         House Entire home/apt            3       1.0        2    2   279
## 4     Apartment Entire home/apt            4       1.0        2    2   395
## 5     Apartment Entire home/apt            4       1.0        2    2   235
## 6         House Entire home/apt            3       1.0        2    2   300
##   review_scores_rating         neighbourhood
## 1                   94           Cole Valley
## 2                  100 Western Addition/NOPA
## 3                  100           Cole Valley
## 4                   97 Western Addition/NOPA
## 5                   86 Western Addition/NOPA
## 6                  100           Cole Valley
##                             listing_url
## 1    https://www.airbnb.com/rooms/91957
## 2 https://www.airbnb.com/rooms/18139835
## 3   https://www.airbnb.com/rooms/172943
## 4  https://www.airbnb.com/rooms/2309140
## 5  https://www.airbnb.com/rooms/2559297
## 6   https://www.airbnb.com/rooms/149108

Visualize Data

hist(rentals$price)

Process Data

cheap <- subset(rentals, price < 401)
hist(cheap$price)

Process Data Some More

cheap_good <- subset(cheap, review_scores_rating > 98)
hist(cheap_good$price)

From Data Frame to SpatialPointsDataFrame

Create a SpatialPointsDataFrame

Use the sp::coordinates() method

Requires a vector indicating the x,y coordinate columns

sp then converts the data frame in place to a SpatialPointsDataFrame

SpatialPointsDataFrame (SPDF)

#First make a copy
cheap_good_orig <- cheap_good

coordinates(cheap_good) <- c('longitude','latitude')
class(cheap_good)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

Compare SPDF to DF

str(cheap_good_orig)
## 'data.frame':    374 obs. of  14 variables:
##  $ id                  : int  18139835 172943 149108 10720392 14594885 15546103 13516315 10058568 1721354 3585034 ...
##  $ name                : Factor w/ 1203 levels " Victorian Suite in Inner Mission",..: 267 917 904 504 448 323 948 314 519 389 ...
##  $ latitude            : num  37.8 37.8 37.8 37.8 37.8 ...
##  $ longitude           : num  -122 -122 -122 -122 -122 ...
##  $ property_type       : Factor w/ 4 levels "Apartment","Condominium",..: 1 3 3 1 2 3 1 1 3 1 ...
##  $ room_type           : Factor w/ 1 level "Entire home/apt": 1 1 1 1 1 1 1 1 1 1 ...
##  $ accommodates        : int  4 3 3 6 4 4 4 4 4 4 ...
##  $ bathrooms           : num  1 1 1 1.5 2 1 2 2 1 1.5 ...
##  $ bedrooms            : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ beds                : int  2 2 2 2 2 2 3 3 2 2 ...
##  $ price               : int  275 279 300 250 200 225 350 215 389 200 ...
##  $ review_scores_rating: int  100 100 100 100 99 100 100 100 100 100 ...
##  $ neighbourhood       : Factor w/ 52 levels "Alamo Square",..: 52 7 7 20 52 20 7 7 20 7 ...
##  $ listing_url         : Factor w/ 1206 levels "https://www.airbnb.com/rooms/1003756",..: 490 446 293 41 268 336 200 3 439 776 ...

SPDF

str(cheap_good)

SPDF Slots

You can see from str(cheap_good) that a SPDF object is a collection of slots or components. The key ones are:

  • @data data frame of attributes that describe each location
  • @coords the coordinates for each location
  • @bbox the min and max bounding coordinates
  • @proj4string the coordinate reference system definition as a string

SPDF Slots

Review the output of each of these:

summary(cheap_good)
head(cheap_good@coords)
head(cheap_good@data)
cheap_good@bbox
cheap_good@proj4string

What's missing?

Are all the columns that were present in the DF now in the SPDF?

Is there a slot without data?

What is the CRS of the data?

cheap_good@proj4string # get a CRS object
## CRS arguments: NA

Define a CRS

Assigning a CRS to a spatial data object locates its coordinates on the surface of the Earth

You need to know

  • the CRS for the data
  • how to define the CRS object
  • how to assign it to the spatial data

Define a CRS

Known as defining a projection in ArcGIS

Defining a CRS != Transforming CRS

We will get to transformations soon!

CRS Objects

Need the proj4 string that defines the CRS

  • or its EPSG code!
# Create a CRS object
WGS84_CRS <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0") 

# Set the CRS of the SPDF
proj4string(cheap_good) <- WGS84_CRS

# check it
cheap_good@proj4string
## CRS arguments:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

Another way

# or use the EPSG code directly
proj4string(cheap_good) <- CRS("+init=epsg:4326") 

# or enter the string
proj4string(cheap_good) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")  

Define and Assign - Incorrectly?

What happens if we assign the wrong CRS?

# 26910 is the EPSG code for UTM Zone 10, NAD83
proj4string(cheap_good) <- CRS("+init=epsg:26910") 

Define and Assign - Incorrectly?

What happens if we assign the wrong CRS?

Software doesn't care but you need to!

Finding CRS Codes

Common CRSs

Geographic

  • 4326 Geographic, WGS84

  • 4269 Geographic, NAD83 (USA)

Projected

  • 5070 USA Contiguous Albers Equal Area Conic

  • 3310 CA Albers Equal Area

  • 26910 UTM Zone 10, NAD83 (Northern Cal)

  • 3857 Web Mercator (web maps)

Challenge

Use http://spatialreference.org/ to make an educated guess as to the CRS of these coordinates:

X = 549228.058249, Y = 4176578.444299

Strategy:

  • review the bounding box coordinates for the CRSs referenced by the above codes.

Challenge 2

What are the units for that CRS?

Mapping Spatial Objects

spplot

sp includes a plotting method spplot

You can use it to create great maps but it is very low level

which means complex, non-intuitive syntax and long code

spplot the Data

spplot(cheap_good)
## Warning in stack.data.frame(df[zcol]): non-vector columns will be ignored

spplot the Data

spplot(cheap_good,"price")

Challenge

Use spplot to create data maps from some of the other columns in the @data slot

Getting help:

?spplot

Examples

spplot(cheap_good,"bathrooms")

spplot(cheap_good,"accommodates")

spplot(cheap_good,"property_type")

spplot(cheap_good,"neighbourhood")

RECAP

Spatial Data File formats

Vector points, lines & polygons:

  • ESRI Shapefile
  • GeoJSON

Raster grids:

  • TIFF, JPEG

ESRI Shapefile

This is one of the most common, if not the most common, spatial vector data file formats.

Old but everywhere!

Gotchas: 2GB file size limit, 10-character column names

Reading in Geospatial Data

There's an R package for that!

rgdal

rgdal provides R bindings to the powerful and widely used GDAL library.

It is the most commonly used R library for importing and exporting spatial data.

  • OGR: for vector data: readOGR() and writeOGR()

  • GDAL for raster data: readGDAL() and writeGDAL()
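Writing works the same way in reverse. A hedged sketch that builds a small SpatialPointsDataFrame and writes it out with writeOGR (the output folder and layer names here are illustrative, not part of the workshop data):

```r
library(sp)
library(rgdal)

# Build a tiny SpatialPointsDataFrame to write out
pts <- data.frame(long  = c(-122.25, -122.27),
                  lat   = c(37.87, 37.86),
                  price = c(250, 300))
coordinates(pts) <- c("long", "lat")
proj4string(pts) <- CRS("+init=epsg:4326")

# dsn = output folder, layer = file name without the .shp extension
dir.create("data_out", showWarnings = FALSE)
writeOGR(pts, dsn = "data_out", layer = "points_out",
         driver = "ESRI Shapefile")
```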

rgdal

library(rgdal)
## rgdal: version: 1.2-8, (SVN revision 663)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
##  Linking to sp version: 1.2-4
# See what file types are supported by rgdal drivers
# ogrDrivers()$name

Getting help

gdal.org

?readOGR

For more info on working with rgdal to load different types of spatial data in R see this excellent tutorial by Zev Ross.

Read in Shapefile

sfboundary <- readOGR(dsn="data",layer="sf_boundary")
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "sf_boundary"
## with 1 features
## It has 3 fields

Read in Shapefile

sfboundary <- readOGR(dsn="data",layer="sf_boundary")
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "sf_boundary"
## with 1 features
## It has 3 fields
# or
# sfboundary <- readOGR("data","sf_boundary")
# but not
#sfboundary <- readOGR(dsn="data/",layer="sf_boundary")

Check out the data structure

#str(sfboundary)  
summary(sfboundary)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x  542696.6  556659.9
## y 4173563.7 4185088.6
## Is projected: TRUE 
## proj4string :
## [+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80
## +towgs84=0,0,0]
## Data attributes:
##             NAME      Pop2010         Land_sqmi    
##  San Francisco:1   Min.   :805235   Min.   :46.87  
##                    1st Qu.:805235   1st Qu.:46.87  
##                    Median :805235   Median :46.87  
##                    Mean   :805235   Mean   :46.87  
##                    3rd Qu.:805235   3rd Qu.:46.87  
##                    Max.   :805235   Max.   :46.87

Make a quick plot to check the data

How?

Make a quick plot to check the data

plot(sfboundary)

Take a look at the attribute data

How?

Take a look at the attribute data

head(sfboundary@data)   
##            NAME Pop2010 Land_sqmi
## 0 San Francisco  805235     46.87

Take a look at the CRS info

How?

Take a look at the CRS info

Is this a geographic or projected CRS?

sfboundary@proj4string
## CRS arguments:
##  +proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80
## +towgs84=0,0,0

Plot with Rentals

plot(sfboundary)
points(cheap_good, col="red")

Plot with Rentals

Where are the points?

plot(sfboundary)
points(cheap_good, col="red")

What's Wrong?

Compare the CRSs, are they the same?

proj4string(sfboundary)
proj4string(cheap_good)
proj4string(sfboundary) == proj4string(cheap_good)

Compare the coord data

sfboundary@bbox
##         min       max
## x  542696.6  556659.9
## y 4173563.7 4185088.6
cheap_good@bbox
##                  min        max
## longitude -122.50647 -122.38947
## latitude    37.70738   37.80646

CRS Transformations

CRS Transformations

All of the geospatial data layers you work with together should be in the same CRS.

Geospatial data transformations are super common.

The most common transformation is a CRS transformation.

This is also called a projection transformation, or reprojection.

Transform the CRS

Use sp function spTransform

Requires as input:

  • a spatial object to transform, which must have a defined CRS
  • a CRS object that indicates the target CRS

Outputs a new spatial object with coordinate data in the target CRS

Transform sfboundary

sf_lonlat <- spTransform(sfboundary, WGS84_CRS)

spTransform 4 Ways

Use the CRS of another data layer

sf_lonlat <- spTransform(sfboundary, CRS(proj4string(cheap_good)))

Use CRS string

sf_lonlat <- spTransform(sfboundary, CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"))

Use CRS code

sf_lonlat <- spTransform(sfboundary, CRS("+init=epsg:4326"))

Use a CRS object

WGS84_CRS <- CRS(proj4string(cheap_good))
sf_lonlat <- spTransform(sfboundary, WGS84_CRS)

Did it work?

How will we know?

Do the CRSs match?

proj4string(cheap_good) == proj4string(sf_lonlat)

Overlay the data in space

plot(sf_lonlat)
points(cheap_good, col="red")
points(cheap_good[cheap_good$price<100,], col="green", pch=19)

Projections, CRS, oh my!

We want all data in the same CRS

Which one is best?

Line Data

Line Data

Let's read in some line data

In the popular GeoJSON file format

Reading in GeoJSON File

GeoJSON can be tricky!

sf_streets <- readOGR(dsn='data/sf_highways.geojson', layer="OGRGeoJSON")
## OGR data source with driver: GeoJSON 
## Source: "data/sf_highways.geojson", layer: "OGRGeoJSON"
## with 246 features
## It has 5 fields

Plot the data

plot(sf_streets)

Examine the data

str(sf_streets)
summary(sf_streets)
  • What is the CRS of the data?
  • Are the data projected?
  • Does the CRS match that of the other layers?

Add all the data to one map

How do we do that?

Map all the data

plot(sf_lonlat)
lines(sf_streets)
points(cheap_good, col="red")
points(cheap_good[cheap_good$price<200,], col="green")

RECAP

  • Read in data in CSV, Shapefile and GeoJSON formats
    • with read.csv and rgdal::readOGR
  • Created Spatial{Points, Lines, Polygons}DataFrames

  • Defined CRS with proj4string()

  • Transformed CRS with spTransform()

  • Mapped data with plot() and spplot

  • Mapped multiple geospatial data layers

Data Driven Maps

Data Driven Maps

Also called thematic maps or data maps

Use data values to determine symbology

Use symbology to convey data values / meaning

R packages for Mapping

Lots of them

Let's quickly discuss the most common ones

plot

Pros

  • Quick maps
  • Easy to map multiple layers
  • Works with sp objects

Cons

  • Requires lots of complex code to make pretty

spplot

Pros

  • Quick thematic maps of one layer
  • More config options for sp objects than plot

Cons

  • Pretty maps require lots of complex code

ggplot2 and ggmap

Pros

  • beautiful maps with ggplot2
  • builds on existing R knowledge of ggplot2
  • Some great geospatial data functionality in ggmap

Cons

  • Doesn't work with sp objects
  • Limited support for CRS and spatial operations
    • Expects longitude and latitude coordinates
  • Complex syntax and methods to create great maps

tmap

Pros

  • Quick and powerful maps
  • Works with sp objects
  • Syntax similar to but not as complex as ggplot2
  • Easy to save as static or interactive tmaps

Cons

  • No basemaps in static mode

leaflet

Pros

  • R port of the popular Javascript library

  • Allows one to create interactive maps that can be served on web

  • Highly customizable!

Cons

  • Highly customizable!

tmap

tmap

tmap stands for thematic map

Nice maps with less code than the alternatives

Syntax should be familiar to ggplot2 users, but simpler

tmap in a Nutshell is a good starting point

tmap

Load the library

library(tmap)
?tmap

US Population by State

Let's explore population data for US states.

The file is data/us_states_pop.shp

Challenge

Use readOGR to load it and plot to view it

Read the Data

uspop <- readOGR("./data", "us_states_pop")
## OGR data source with driver: ESRI Shapefile 
## Source: "./data", layer: "us_states_pop"
## with 49 features
## It has 4 fields
# Review the Data
#summary(uspop)

Basic Questions

  • What type of data object is uspop?

  • How many features does it contain?

  • How many attributes describe those features?

  • What is the CRS?

Quick tmap - qtm

Make a quick plot with default options

qtm(uspop)

qtm of the data values

qtm(uspop,"POPULATION")

tmap Shapes and Thematic Elements

tmap's flexibility comes from how intuitively it allows you to layer spatial data and style the layers by data attributes

Use tm_shape(<sp_object>) to specify a geospatial data layer

Add + tm_<element>(...) to style the layer by data values

…and other options for creating a publication ready map

Exploring tmap functionality

?tm_polygons

vignette('tmap-nutshell')

Customizing Shape Elements

tm_shape(uspop) + tm_polygons(col="beige", border.col = "red")

Challenge

Customize the map with different fill and outline colors

You can use color names or HEX values

My Map

tm_shape(uspop) + tm_polygons(col="#f6e6a2", border.col = "white")

CRS and Shape

Notice anything odd about shape of USA?

tm_shape(uspop) + tm_polygons(col="#f6e6a2", border.col = "white")

Projected CRSs are usually used for mapping

Setting the CRS Dynamically

Is this better?

tm_shape(uspop, projection="+init=epsg:5070") + tm_polygons(col="#f6e6a2", border.col = "white")

Challenge

Dynamic CRS transformations can make operations slow or unsupported.

Transform the uspop data to the USA Contiguous Albers CRS (5070),

saving it as a new SpatialPolygonsDataFrame called uspop_5070

uspop_5070

uspop_5070 <- spTransform(uspop, CRS("+init=epsg:5070"))

Plotting the Transformed Data

What's happening here?

tm_shape(uspop_5070) + tm_polygons(col="#f6e6a2", border.alpha = 0) +
tm_shape(uspop) + tm_borders(col="purple")

Choropleth Maps

Choropleth maps

Color areas by data values

Map states by population

tm_shape(uspop_5070) + tm_polygons(col="POPULATION")

Great Choropleth Maps

  1. Color Palettes

  2. Data Classification

Color

RColorBrewer Package

The R package RColorBrewer is often used to select color palettes.

Read the package help for more info.

?RColorBrewer or see colorbrewer.org

library(RColorBrewer)

# ?RColorBrewer

Exploring Brewer Palettes

You can use display.brewer.pal to see the colors in a specific named palette, or display.brewer.all to see every palette.

For example:

display.brewer.all()

Types of Color Palettes:

  • Qualitative - complementary colors, e.g. pastels, to emphasize different categories

  • Sequential - a range of different shades of the same color (hue) to imply higher to lower ranks or values

  • Divergent - two sequential color palettes with a light or grey color in the middle; used to emphasize outliers

Displaying Color Palettes by Type

display.brewer.all(type="qual")
display.brewer.all(type="seq")
display.brewer.all(type="div")

Applying a Color Palette

Let's see how a BuPu (blue-purple) palette works with this data.

tm_shape(uspop_5070) + tm_polygons(col="POPULATION", palette="BuPu")

Challenge

Try a sequential, divergent and qualitative color palette with this dataset.

Set auto.palette.mapping=FALSE and TRUE with the Spectral palette.

  • You can read about this option in ?tm_polygons

tm_shape(uspop_5070) + tm_polygons(col="POPULATION", palette="Spectral", auto.palette.mapping=FALSE)

Data classification

Why Data classification

Important for improving the cartographic display of continuous data

Unclassified maps scale full range of values to color palette

  • Great for exploring trends and outliers,

  • but hard to interpret specific data values.

  • The eye can only differentiate a few colors.

Data classification

You can use a data classification method to reduce the complexity in your mapped data by classifying continuous values into a small number of bins, typically 5-7.

Unique symbology - color - is then associated with each bin.

A legend indicates the association, making it easier to interpret data values.

Data Classification Methods

Common methods for binning data into a set of classes include:

  • equal interval: classes the data into bins of equal data ranges, e.g. 10-20, 20-30, 30-40.

  • quantile: classes the data into bins with an equal number of data observations. This is the default for most mapping software.

  • fisher/jenks/natural breaks: classes data into bins that minimize within-group variance and maximize between-group variance.

  • standard deviation: emphasizes outliers by classing data into bins based on standard deviations around the mean, e.g. -2 to +2 SDs.
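You can inspect the break points these methods produce with the classInt package, which tmap uses under the hood. A small sketch with made-up prices:

```r
library(classInt)

# Illustrative, made-up nightly prices
prices <- c(95, 120, 150, 199, 200, 235, 250, 275, 300, 350, 395, 450)

# Quantile: roughly equal numbers of observations per bin
classIntervals(prices, n = 4, style = "quantile")$brks

# Equal interval: bins of equal width across the data range
classIntervals(prices, n = 4, style = "equal")$brks
```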

Data classification in tmap

tmap uses the classification methods in the classInt package

The class method is set by the style parameter in tm_polygons - or another style element, e.g. tm_symbols

See ?tm_polygons for keyword options

tmap data classification

tm_shape(uspop_5070) + tm_polygons(col="POPULATION", palette="BuPu", style="jenks")

Challenge

Try several of the different classification schemes.

Note how they impact the display of the data and thus communicate different messages

What is the default classification scheme used by tmap?

Number of Classes

In addition to setting the classification method you can set the number of classes, or cuts.

By default this is 5

Explore setting it to 3 or 9 with n=

tm_shape(uspop_5070) + tm_polygons(col="POPULATION", palette="BuPu", style="jenks", n=9)

Cartography

The Art and Science of Map Making

Science:

  • the data, the domain,
  • choice of classification scheme

Art:

  • communication & visualization theory
  • choice of color, shape, size, transparency, weight, context

Challenge

Map popdens instead of POPULATION

Don't use defaults

  • Set color palette and classification style

Faceting

tm_shape(uspop_5070) + tm_polygons(col=c("POPULATION","popdens"), 
                              style=c("jenks","jenks"))

Mapping Points

Sometimes polygons get in the way of what you want to communicate

Why?

What state has the greatest population density?

Polygons as Points

You can use tm_symbols instead of tm_polygons to map the data as points

tm_shape(uspop_5070) + tm_symbols(col="popdens", style="jenks")

Mapping multiple layers

tm_shape(uspop_5070) + tm_polygons(col="white", border.alpha = 0.5) + 
  tm_shape(uspop_5070) + tm_symbols(col="popdens", style="jenks")

Point Maps

Graduated Color Map when data are classified

Proportional Color Map when they are not

Challenge

Change the marker shape to square

Graduated Symbol Maps

Vary symbol size by data value

tm_shape(uspop_5070) + tm_polygons(col="white", border.alpha = 0.5) + 
  tm_shape(uspop_5070) + tm_symbols(size="popdens", style="jenks")

Challenge

Take a look at ?tm_symbols and adjust the graduated symbol map

Also update the legend title

Take 2

tm_shape(uspop_5070) + tm_polygons(col="white", border.alpha = 0.5) + 
  tm_shape(uspop_5070) + tm_symbols(size="popdens", style="sd", size.lim=c(1,500))

Interactive tmaps

Plot vs View Modes

tmap has two modes: plot and view

The plot mode is the default and is for creating static maps.

The view mode is for creating interactive maps using leaflet.

You can switch back and forth with the tmap_mode() function

View mode

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(uspop) + tm_polygons(col="POPULATION", style="jenks")

Add Popup Content

?tmap_mode

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(uspop) + tm_polygons(col="POPULATION", style="jenks", popup.vars=c("NAME","POPULATION","popdens"))

Plot Mode

Return to plot mode

tmap_mode('plot')
## tmap mode set to plotting

See vignette("tmap-modes") for more on interactive maps.

save_tmap

You can save static and interactive maps with save_tmap

See: ?save_tmap for details or vignette("tmap-nutshell")
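For example, a sketch assuming the uspop_5070 object from earlier in the workshop (the output file names are illustrative):

```r
library(tmap)

m <- tm_shape(uspop_5070) + tm_polygons(col = "POPULATION", style = "jenks")

# Static image - the format is inferred from the file extension
save_tmap(m, "us_population.png")

# Interactive leaflet widget, viewable in a browser
save_tmap(m, "us_population.html")
```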

leaflet

Use the R leaflet package for more customized leaflet maps.

See the RStudio Leaflet tutorial.
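A minimal sketch, reusing the cafes data frame loaded at the start of the workshop:

```r
library(leaflet)

# cafes has long/lat columns; ~col refers to a column of the data frame
leaflet(cafes) %>%
  addTiles() %>%                     # OpenStreetMap basemap
  addCircleMarkers(lng = ~long, lat = ~lat,
                   popup = ~name, color = "red")
```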


Challenge

See ?tm_symbols to see what parameter you would add to make the symbol sizes smaller

See ?tm_lines to see how you would increase the thickness (or weight) of the street lines

See ?tm_polygons to see how you would change the fill and border color and add transparency to the fill

Bonus - what parameter adjusts transparency?

Solution

tm_shape(sf_lonlat) + tm_polygons(col="beige", border.col = "blue") + 
tm_shape(sf_streets) + tm_lines(col="black", lwd = 3) +
tm_shape(sf_streets) + tm_lines(col="white", lwd = 1) +
tm_shape(cheap_good) + tm_symbols(col="red", size = 0.5, alpha=0.5)

Does this work?

If yes, what does it tell you?

tm_shape(sfboundary) + tm_polygons(col="beige") + 
tm_shape(cheap_good) + tm_symbols(col="red")

Category Maps

Mapping categorical (or qualitative) data with symbol shape

Category Maps

tm_shape(sfboundary) + 
  tm_polygons(col="beige") + 
tm_shape(cheap_good) + 
  tm_symbols(shape="property_type")

Mapping Quantitative Data

Using color and size to convey data values

Proportional Symbol Maps

tm_shape(sfboundary) + 
  tm_polygons(col="beige") + 
tm_shape(cheap_good) + 
  tm_symbols(size="price")

Challenge

Redo that map - set the symbol fill to transparent - set the legend title to Price per Night

tm_shape(sfboundary) + 
  tm_polygons(col="beige") + 
tm_shape(cheap_good) + 
  tm_symbols(size="price", title.size = "Price per Night", alpha = 0)

Proportional Color Maps

tm_shape(sfboundary) + 
  tm_polygons(col="beige") + 
tm_shape(cheap_good) + 
  tm_symbols(col="price")

Critique

What worked?

What didn't?

How to improve?


tm_shape(sfboundary) + 
  tm_polygons(col="beige") + 
tm_shape(cheap_good) + 
  tm_symbols(col="price", palette = "YlOrRd")

Challenge

  1. Change the color palette to Blues and Purples (sequential)

  2. Change the color palette to Spectral (divergent) using red for higher prices.

Use display.brewer.all and the Examples in ?tm_symbols for help

Hint: you may need to set auto.palette.mapping for Spectral colors

Solution

tm_shape(sfboundary) + 
  tm_polygons(col="beige") + 
tm_shape(cheap_good) + 
  tm_symbols(col="price", size=0.5, palette = "BuPu", auto.palette.mapping=FALSE)

tm_shape(sfboundary) + 
  tm_polygons(col="beige") + 
tm_shape(cheap_good) + 
  tm_symbols(col="price", size=0.5, palette = "-Spectral", auto.palette.mapping=FALSE)

Order matters!

Layers draw in the order that they are added to the map.

So, should it be

  • points, lines, polygons or
  • polygons, points, lines

Order matters!

Data will display on the map in the order it appears in the data frame.

You can sort the data by a column value to control this.

Question: Would you order ascending or descending to better see cheap rentals?

Draw Order

# sort data frame by price
cheap_good <- cheap_good[order(cheap_good$price, decreasing = TRUE),] 

tm_shape(sfboundary) + 
  tm_polygons(col="beige") + 
tm_shape(cheap_good) + 
  tm_symbols(col="price", size=0.5, palette = "-Spectral", auto.palette.mapping=FALSE)

tm_shape(sfboundary) + tm_polygons(col="white") + 
tm_shape(cheap_good) + tm_symbols(size="price", style="quantile", border.col="red", alpha=0, title.col="Price per Night")

Compare that to an unclassified symbol map

tm_shape(sfboundary) + tm_polygons(col="white") + 
tm_shape(cheap_good) + tm_symbols(size="price", border.col="red", alpha=0, title.col="Price per Night", perceptual=T)

Graduated Symbol Maps

tm_shape(sfboundary) + tm_polygons(col="white") + 
tm_shape(cheap_good) + tm_symbols(size="price", style="sd", border.col="red", alpha=0,  title.col="Price per Night")

Challenge

Try some other classification schemes for your graduated symbol map

Note how they affect the display of the data

Graduated Color Maps

Hold size constant and vary color to create a Graduated color map

Choice of color palette is very important

Graduated Color Map

tm_shape(sfboundary) + 
  tm_polygons(col="#fbf9f1") + 
tm_shape(cheap_good) + tm_symbols(col="price", style="jenks", size=.5, 
      palette="Reds", auto.palette.mapping=F, 
      border.alpha=0, alpha=0.75, title.col="Price per Night")

Color and Classification method

Try different classification methods

Note how they change the display of the data

tm_shape(sfboundary) + 
  tm_polygons(col="white") + 
tm_shape(cheap_good) + tm_symbols(col="price", style="equal",size=.5, 
      palette="Reds", auto.palette.mapping=F, 
      border.alpha=0, alpha=0.75, title.col="Airbnb 2bd Price")
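The difference between classification styles can be sketched in base R with hypothetical prices (not the workshop data): equal-interval breaks divide the data range into equal-width bins, while quantile breaks put roughly equal counts of observations in each class.

```r
# Hypothetical skewed prices, standing in for cheap_good$price
prices <- c(80, 90, 95, 100, 105, 110, 120, 150, 300, 450)

# Equal-interval breaks: split the range [min, max] into 4 equal-width bins
equal_breaks <- seq(min(prices), max(prices), length.out = 5)

# Quantile breaks: split at the 0/25/50/75/100th percentiles
quantile_breaks <- quantile(prices, probs = seq(0, 1, 0.25))

table(cut(prices, equal_breaks, include.lowest = TRUE))     # most values fall in bin 1
table(cut(prices, quantile_breaks, include.lowest = TRUE))  # ~equal counts per bin
```

With skewed data like rental prices, equal intervals lump most points into one class; quantiles spread them out, which is why the map's look changes so much with the `style` argument.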

Choropleth maps

Color areas by data values

The polygon version of the graduated color map

US Population by State

load it in and view it

uspop <- readOGR("./data", "us_states_pop")  # readOGR() is from the rgdal package
## OGR data source with driver: ESRI Shapefile 
## Source: "./data", layer: "us_states_pop"
## with 49 features
## It has 4 fields
plot(uspop)

tmap customizations

Add:

  • scalebar
  • north arrow

tm layout styles

tm_shape(uspop) + tm_polygons() + tm_scale_bar(position=c("left","bottom")) + tm_compass() + tm_style_classic()

Try another style

Natural

Albatross

Beaver

etc…

What happened?

Map it with tmap

tm_shape(uspop) + tm_polygons(col="POPULATION")

Try a different classification scheme

tm_shape(uspop) + tm_polygons(col="POPULATION", style="jenks")

Notice anything odd about the shape of the USA?

What's the CRS?

Why might that matter?

Projection Transformation

When larger areas are mapped using geographic coordinates the distortion becomes more noticeable.

So these data are typically transformed to another CRS that better preserves shape.
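One way to see why unprojected (lat/lon) data distort at continental scale: a degree of longitude spans less ground the farther you move from the equator, shrinking with the cosine of latitude. A quick base-R sketch (assuming a 6371 km mean Earth radius):

```r
# Ground length of one degree of longitude (km) at a given latitude.
# It shrinks in proportion to cos(latitude).
R_km <- 6371  # mean Earth radius in km (an approximation)
deg_lon_km <- function(lat_deg) {
  (2 * pi * R_km / 360) * cos(lat_deg * pi / 180)
}

deg_lon_km(0)    # ~111 km at the equator
deg_lon_km(45)   # ~79 km, roughly the latitude of the northern US
deg_lon_km(60)   # ~56 km, half the equatorial value
```

Plotting lat/lon directly treats every degree as the same width, so mid- and high-latitude areas get stretched east-west; a projected CRS like Albers corrects for this.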

USA Contiguous Albers Equal Area Conic, EPSG:5070

The Albers equal-area conic is the typical projection for historical USGS maps of the lower 48.

It is a general-purpose, low-distortion compromise for mid-latitude extents that are wider than they are tall.

Preserves relative area; shapes are only slightly distorted.

USA Albers

uspop_albers <- spTransform(uspop, CRS("+init=epsg:5070"))

tm_shape(uspop_albers) + tm_polygons(col="POPULATION", style="jenks")

Challenge

Map population density instead of population

Discuss

tm_shape(uspop_albers) + tm_polygons(col="popdens", style="jenks")
#vs
tm_shape(uspop_albers) + tm_polygons(col="POPULATION", style="jenks")
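Density normalizes counts by area, which is why the two maps can look so different. With hypothetical numbers (the real workshop column is `popdens` on `uspop_albers`), the computation is just:

```r
# Hypothetical state table: POPULATION in people, area_sqkm in km^2
states <- data.frame(
  name       = c("A", "B"),
  POPULATION = c(1000000, 1000000),
  area_sqkm  = c(10000, 100000)
)

# Same population, very different density (people per km^2)
states$popdens <- states$POPULATION / states$area_sqkm
states$popdens
```

A raw-count choropleth rewards big states; a density choropleth shows where people are concentrated.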

Points vs Polygons

Map the states as points not polygons

How?

USA Population

tm_shape(uspop_albers) + tm_polygons(col="white",border.alpha = 0.5)+
tm_shape(uspop_albers) + tm_symbols(col="popdens", style="jenks")

Multivariate

bigmap <- tm_shape(sfboundary) + 
   tm_polygons(col="beige") + 
 tm_shape(cheap_good) + 
    tm_symbols(size="coit_dist", title.size="Distance to Coit Tower (KM)", col="price", title.col="Price", shape="property_type", title.shape="Property Type") +
   tm_layout( legend.bg.color="white",inner.margins=c(.05,.05, .15, .25), title="Airbnb 2 Bedroom Rentals, San Francisco Fall 2017", legend.position=c("right","center"))
 
 bigmap

Simple leaflet map

leaflet(cheap_good) %>% addTiles() %>%
    addCircleMarkers(data = cheap_good, radius = 5, stroke=F,
    color = "purple", fillOpacity = 0.75
  )

Color Palettes

library(RColorBrewer)

display.brewer.all()

Add palette

pal <- colorQuantile("Reds",NULL,5)
leaflet(cheap_good) %>% addTiles() %>%
    addCircleMarkers(
      data = cheap_good,
      radius = 6,
      color = ~pal(price),
      stroke = F, 
      fillOpacity = 0.75
  )

Add popup

popup_content <- cheap_good$name
popup_content <- paste0(popup_content, "<br>Price per night: $", cheap_good$price)
popup_content <- paste0(popup_content, "<br><a href=",cheap_good$listing_url,">More info...</a>")
 
leaflet(cheap_good) %>% addTiles() %>%
    addCircleMarkers(
      data = cheap_good,
      radius = 6,
      color = ~pal(price),
      stroke = F, 
      fillOpacity = 0.75,
      popup = popup_content)
  

The future is sf

What's special about spatial objects?

What do you think this code does?

Think about it

Try it and see

Check the help: ?spDistsN1

coit_tower <- c(-122.405837, 37.802032)  # lon, lat as a numeric vector, not a string

cheap_good$coit_dist <- 
  spDistsN1(cheap_good, coit_tower, longlat = TRUE)

head(cheap_good@data)
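spDistsN1() with longlat = TRUE returns great-circle distances in kilometers. A self-contained haversine sketch shows the idea (the 6371 km mean radius and the sample listing location are assumptions for illustration):

```r
# Great-circle (haversine) distance in km between two (lon, lat) points
haversine_km <- function(p1, p2, R = 6371) {
  rad  <- pi / 180
  dlat <- (p2[2] - p1[2]) * rad
  dlon <- (p2[1] - p1[1]) * rad
  a <- sin(dlat / 2)^2 +
       cos(p1[2] * rad) * cos(p2[2] * rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a))
}

coit    <- c(-122.405837, 37.802032)  # Coit Tower (lon, lat)
listing <- c(-122.41, 37.77)          # hypothetical rental location
haversine_km(coit, listing)           # a few km, like the coit_dist values
```

This is an approximation on a sphere; spDistsN1() uses a similar great-circle calculation, which is why it needs longlat = TRUE to know the coordinates are degrees, not meters.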

Output code to script

library(knitr)
purl("r-geospatial-workshop-f2017.Rmd", output = "test2.R", documentation = 2)

References

“Visualisation” section of the CRAN Task View

Quick-R website's page on graphical parameters